Do LLMs Exhibit Human-Like Bayesian Judgment and Moral Inconsistency? A Replication Study of Cao et al. (2019)

Nykko Vitali

Spoiler

We find that models can perform Bayesian reasoning, yet they condemn others who reach the very same Bayesian judgment they themselves made, just like humans.

Let’s explore how this unfolds.

But first…

A Quick Primer: Bayes’ Theorem Explained

The Formula

Bayes’ theorem is expressed mathematically as:

\[P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\]

Where:

  • \(P(A|B)\) = Posterior probability - The probability of event A given that B has occurred
  • \(P(B|A)\) = Likelihood - The probability of event B given that A has occurred
  • \(P(A)\) = Prior probability - The initial probability of event A
  • \(P(B)\) = Normalizing constant - The total probability of event B

In Plain Language

Bayes' theorem helps us update our beliefs when we receive new evidence:

  1. Start with what we initially believe (prior)
  2. Consider how likely we would be to observe the evidence if our belief were true (likelihood)
  3. Normalize by the overall probability of seeing this evidence (normalizing constant)
  4. End with our updated belief (posterior)
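The four steps above can be sketched in a few lines of Python for a binary hypothesis; the input probabilities here are purely illustrative:

```python
def bayes_update(prior, likelihood, likelihood_not):
    """Return P(H|E) given P(H), P(E|H), and P(E|not H)."""
    # Step 3: normalizing constant P(E), via the law of total probability
    evidence = likelihood * prior + likelihood_not * (1 - prior)
    # Step 4: posterior P(H|E)
    return likelihood * prior / evidence

# Steps 1-2: an illustrative prior of 0.5 and likelihoods of 0.8 vs 0.3
posterior = bayes_update(prior=0.5, likelihood=0.8, likelihood_not=0.3)
print(round(posterior, 3))  # 0.727
```

Because the evidence is more likely under the hypothesis (0.8) than under its negation (0.3), the belief moves upward from 0.5.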

Bayesian Updating Visualized

Example 1: Medical Diagnosis

The Classic Doctor Problem

A medical test for a rare disease has the following properties:

  • 1% of the population has the disease (prior)
  • The test is 95% accurate for positive cases (sensitivity)
  • The test is 90% accurate for negative cases (specificity)

If a patient tests positive, what is the probability they have the disease?

Bayes' Theorem Setup

Let’s define our variables:

  • \(D\) = Patient has the disease
  • \(T\) = Test is positive

We want to find: \(P(D|T)\) = Probability of disease given positive test

Using Bayes' formula: \[P(D|T) = \frac{P(T|D) \times P(D)}{P(T)}\]

The Calculation

\[P(D|T) = \frac{P(T|D) \times P(D)}{P(T|D) \times P(D) + P(T|\neg D) \times P(\neg D)}\]

Substituting our values:

  • \(P(D)\) = 0.01 (1% have the disease)
  • \(P(T|D)\) = 0.95 (95% sensitivity)
  • \(P(T|\neg D)\) = 0.10 (10% false positive rate)
  • \(P(\neg D)\) = 0.99 (99% don’t have the disease)

\[P(D|T) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.10 \times 0.99} = \frac{0.0095}{0.0095 + 0.099} = \frac{0.0095}{0.1085} \approx 0.088\]

Only about 8.8% of people who test positive actually have the disease!
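The same arithmetic as a short script, using only the numbers given above:

```python
p_d = 0.01        # P(D): prior prevalence of the disease
p_t_d = 0.95      # P(T|D): sensitivity
p_t_not_d = 0.10  # P(T|not D): false positive rate (1 - specificity)

# Normalizing constant: total probability of a positive test
p_t = p_t_d * p_d + p_t_not_d * (1 - p_d)

# Posterior: probability of disease given a positive test
p_d_t = p_t_d * p_d / p_t
print(round(p_d_t, 3))  # 0.088
```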

Why This Is Surprising

Even with a 95% accurate test, most positive results are false positives when the condition is rare!

Example 2: Predictive Text

Bayesian Reasoning in AI

When your phone predicts the next word, it's using Bayesian reasoning:

\[P(word|context) = \frac{P(context|word) \times P(word)}{P(context)}\]

Where:

  • \(P(word)\) = Prior probability of the word (how common it is)
  • \(P(context|word)\) = Likelihood of seeing this context if the word follows
  • \(P(context)\) = Probability of seeing this context with any word
  • \(P(word|context)\) = Posterior probability of the word given the context

Example in Action

Imagine typing: “I need to go to the”

What is the probability the next word is “store” vs. “hospital”?
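With made-up numbers (every value below is an illustrative assumption, not a real corpus statistic), the comparison might look like this. Since \(P(context)\) is the same for both candidates, it cancels when we normalize:

```python
# Assumed word frequencies, P(word) -- illustrative only
prior = {"store": 0.02, "hospital": 0.005}
# Assumed P("I need to go to the" | word) -- illustrative only
likelihood = {"store": 0.30, "hospital": 0.10}

# Unnormalized posterior: likelihood * prior for each candidate
score = {w: likelihood[w] * prior[w] for w in prior}
# Normalize so the two candidates' probabilities sum to 1
total = sum(score.values())
posterior = {w: s / total for w, s in score.items()}
print({w: round(p, 3) for w, p in posterior.items()})
# {'store': 0.923, 'hospital': 0.077}
```

"store" wins on both counts here: it is the more common word and fits the context better.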

Why This Matters

Just like with the medical example, Bayesian reasoning helps models:

  1. Balance general knowledge (priors) with contextual evidence (likelihood)
  2. Make better predictions by combining multiple factors
  3. Avoid overconfidence based on limited evidence
  4. Update beliefs appropriately as more context is provided

This is how AI systems can make reasonable predictions about what you will say next!

What is a Large Language Model (LLM)?

  • AI systems trained on vast amounts of text data
  • Learn patterns in language to predict what comes next
  • Can generate human-like text, answer questions, translate languages, etc.
  • Examples: GPT-4, Claude, Gemini, etc.

How do LLMs work?

The Embedding Process

The Attention Mechanism

  • Allows the model to focus on relevant words regardless of position
  • Each token “pays attention” to all other tokens in varying degrees
  • In attention-map visualizations, darker colors show stronger connections between words
  • Multiple attention “heads” capture different types of relationships
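The bullets above can be made concrete with a minimal NumPy sketch of a single scaled dot-product attention head (toy shapes and random values, not any real model's weights):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention for one head."""
    # How strongly each token attends to every other token
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over each row, so the attention weights sum to 1 per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))         # 4 tokens, 8-dim embeddings (toy values)
out = attention(tokens, tokens, tokens)  # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

Real models run many such heads in parallel and concatenate their outputs, which is how different heads capture different relationships.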

Next Token Prediction

  • LLMs predict the most likely next word based on context
  • Probability distribution over the entire vocabulary
  • The model selects the most probable token or samples from the distribution
  • This process repeats to generate coherent text
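The selection step can be sketched over a tiny vocabulary (the probabilities are invented for illustration):

```python
import random

# Assumed next-token distribution after "I need to go to the"
vocab_probs = {"store": 0.55, "hospital": 0.20, "gym": 0.15, "moon": 0.10}

# Greedy decoding: always take the most probable token
greedy = max(vocab_probs, key=vocab_probs.get)

# Sampling: draw from the distribution, so less likely tokens sometimes appear
sampled = random.choices(list(vocab_probs), weights=vocab_probs.values())[0]

print(greedy)  # store
```

Appending the chosen token to the context and repeating this step is what produces coherent running text.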

LLM Training Process

  • Pre-training on trillions of words
  • Self-supervised learning: predict masked or next tokens
  • Fine-tuning to improve specific capabilities
  • Alignment with human values and preferences (RLHF)

Company Philosophies

OpenAI (GPT-4o)

  • Founded in 2015 by Sam Altman, Greg Brockman, and others
  • Mission: “Ensure that artificial general intelligence benefits all of humanity”
  • Approach: Developing cutting-edge AI technology with broad accessibility
  • Focus: Speed, multimodal capability, and user-friendly interfaces

Anthropic (Claude 3.7)

  • Founded in 2021 by Dario and Daniela Amodei (former OpenAI researchers)
  • Mission: “Building reliable, interpretable, and steerable AI systems”
  • Approach: Constitutional AI framework with strong focus on safety
  • Focus: Ethical AI development and helpful, harmless, honest outputs

GPT-4o vs Claude 3.7 Sonnet

GPT-4o

  • Released: August 2024
  • Training data: Up to October 2023
  • Context window: 128K tokens
  • Strong multimodal capabilities

Claude 3.7 Sonnet

  • Released: February 2025
  • Training data: Up to April 2024
  • Context window: 200K tokens
  • Text generation and image interpretation
  • Constitutional AI

Norm Setting

  • Human data will be indicated by red titles
  • GPT-4o’s data will be indicated by green titles
  • Claude Sonnet 3.7’s data will be indicated by orange titles

Study 1

Study 1 Instructions

A man recently performed surgery on a patient.

A woman recently performed surgery on a patient.

Which of the following statements do you agree with?

The man is less likely to be a doctor than the woman.

The man and the woman are equally likely to be a doctor.

The man is more likely to be a doctor than the woman.

Respond with your choice by repeating the statement you agree with.

Who Do You Think Is More Likely?

Humans vs LLMs

How Do You Judge Another?

Immoral Judgments

Incompetent Judgments

Overall Judgments

Human vs LLM Text Analysis

Means & SEs

| Human & LLM | Cao (N=199) | GPT-4o (N=90) | Claude (N=90) |
|---|---|---|---|
| Own Judgment | 93% Equally Likely; 7% Man More Likely | 100% Equally Likely | 100% Equally Likely |
| Judgment of Other | Immoral: 5.81 (SE=0.1); Incompetent: 5.64 (SE=0.11); Overall: 5.72 (SE=0.1) | Immoral: 6.22 (SE=0.04); Incompetent: 6.11 (SE=0.03); Overall: 6.17 (SE=0.04) | Immoral: 7 (SE=0); Incompetent: 7 (SE=0); Overall: 7 (SE=0) |